{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Building and deploying CNN models with Vertex AI pipelines: Challenge Lab"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This Challenge Lab is recommended for students who have enrolled in the [**Learning Tensorflow**]() quest. You will be given a scenario and a set of tasks. Instead of following step-by-step instructions, you will use the skills learned from the labs in the quest to figure out how to complete the tasks on your own! An automated scoring system (shown on the Google Skills lab instructions page) will provide feedback on whether you have completed your tasks correctly.\n",
    "\n",
    "When you take a Challenge Lab, you will not be taught Google Cloud concepts. To build the solution to the challenge presented, use skills learned from the labs in the Quest this challenge lab is part of. You are expected to extend your learned skills and complete all the **`TODO:`** comments in this notebook.\n",
    "\n",
    "Are you ready for the challenge?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Scenario"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You were recently hired as a Machine Learning Engineer for an Optical Character Recognition app development team. Your manager has tasked you with building a machine learning model to recognize Hiragana alphabets. The challenge: your business requirements are that you have just 6 weeks to produce a model that achieves great than 90% accuracy to improve upon an existing bootstrapped solution. Furthermore, after doing some exploratory analysis in your startup's data warehouse, you found that you only have a small dataset of 60k images of alphabets to build a higher-performing solution.\n",
    "\n",
    "To build and deploy a high-performance machine learning model with limited data quickly, you will walk through training and deploying a CNN classifier for online predictions on Google Cloud's [Vertex AI](https://cloud.google.com/vertex-ai) platform. Vertex AI is Google Cloud's next-generation machine learning development platform where you can leverage the latest ML pre-built components to significantly enhance your development productivity, scale your workflow and decision-making with your data, and accelerate time to value."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![Vertex AI: Challenge Lab]()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, you will progress through a typical experimentation workflow where you will build your custom CNN model script using `tf.keras` classification layers. You will then send the model code to a custom training job and run the custom training job using pre-built containers provided by Vertex AI to run training and prediction. Lastly, you will deploy the model to an endpoint so that you can use your model for predictions."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Learning objectives"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* Train a model on Vertex AI using the SDK for Python.\n",
    "* Deploy a custom image classification model for online prediction using Vertex AI."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Installation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "! pip3 install --user --upgrade google-cloud-aiplatform"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Install the latest GA version of *google-cloud-storage* library."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "! pip3 install --user --upgrade google-cloud-storage"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Install the latest version of google-cloud-logging library."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "! pip3 install --user --upgrade google-cloud-logging"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Downgrade protobuf for tensorflow datasets compatibility"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "! pip3 install --user protobuf==3.20"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Install the *pillow* library for loading images."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "! pip3 install --user --upgrade pillow"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Install the *numpy* library for manipulation of image data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "! pip3 install numpy==1.24.4 --user"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can safely ignore errors during the numpy installation."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Restart the kernel\n",
    "\n",
    "Once you've installed everything, you need to restart the notebook kernel so it can find the packages."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "if not os.getenv(\"IS_TESTING\"):\n",
    "    # Automatically restart kernel after installs\n",
    "    import IPython\n",
    "\n",
    "    app = IPython.Application.instance()\n",
    "    app.kernel.do_shutdown(True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Set your project ID\n",
    "\n",
    "**If you don't know your project ID**, you may be able to get your project ID using `gcloud`. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "# Retrieve and set PROJECT_ID environment variables.\n",
    "# TODO: fill in PROJECT_ID.\n",
    "\n",
    "if not os.getenv(\"IS_TESTING\"):\n",
    "    # Get your Google Cloud project ID from gcloud\n",
    "    PROJECT_ID = \"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "PROJECT_ID"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Timestamp\n",
    "\n",
    "If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on resources created, you create a timestamp for each instance session, and append it onto the name of resources you create in this tutorial."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from datetime import datetime\n",
    "\n",
    "TIMESTAMP = datetime.now().strftime(\"%Y%m%d%H%M%S\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create a Cloud Storage bucket\n",
    "\n",
    "**The following steps are required, regardless of your notebook environment.**\n",
    "\n",
    "When you submit a training job using the Cloud SDK, you upload a Python package\n",
    "containing your training code to a Cloud Storage bucket. Vertex AI runs\n",
    "the code from this package. In this tutorial, Vertex AI also saves the\n",
    "trained model that results from your job in the same bucket. Using this model artifact, you can then\n",
    "create Vertex AI model and endpoint resources in order to serve\n",
    "online predictions.\n",
    "\n",
    "Set the name of your Cloud Storage bucket below. It must be unique across all\n",
    "Cloud Storage buckets.\n",
    "\n",
    "**Note:** Replace the <code>REGION</code> with the associated region mentioned in the lab resource panel."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "REGION = \"us-central1\"  # @param {type:\"string\"}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# TODO: Create a globally unique Google Cloud Storage bucket name for artifact storage.\n",
    "# HINT: Start the name with gs://\n",
    "BUCKET_NAME = \"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "! gcloud storage buckets create --location $REGION $BUCKET_NAME"   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Set up variables\n",
    "\n",
    "Next, set up some variables used throughout the tutorial."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Import Vertex SDK for Python\n",
    "\n",
    "Import the Vertex SDK for Python into your Python environment and initialize it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import sys\n",
    "\n",
    "from google.cloud import aiplatform\n",
    "from google.cloud.aiplatform import gapic as aip\n",
    "\n",
    "aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_NAME)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Set hardware accelerators\n",
    "\n",
    "Here to run a container image on a CPU, we set the variables `TRAIN_GPU/TRAIN_NGPU` and `DEPLOY_GPU/DEPLOY_NGPU` to `(None, None)` since this notebook is meant to be run in a Google Skills environment where GPUs cannot be provisioned. \n",
    "  \n",
    "Note: If you happen to be running this notebook from your personal GCP account, set the variables `TRAIN_GPU/TRAIN_NGPU` and `DEPLOY_GPU/DEPLOY_NGPU` to use a container image supporting a GPU and the number of GPUs allocated to the virtual machine (VM) instance. For example, to use a GPU container image with 4 Nvidia Tesla K80 GPUs allocated to each VM, you would specify:\n",
    "\n",
    "    (aip.AcceleratorType.NVIDIA_TESLA_K80, 4)\n",
    "\n",
    "See the [locations where accelerators are available](https://cloud.google.com/vertex-ai/docs/general/locations#accelerators)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "TRAIN_GPU, TRAIN_NGPU = (None, None)\n",
    "DEPLOY_GPU, DEPLOY_NGPU = (None, None)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Set pre-built containers\n",
    "\n",
    "There are two ways you can train a custom model using a container image:\n",
    "\n",
    "- **Use a Google Cloud prebuilt container**. If you use a prebuilt container, you will additionally specify a Python package to install into the container image. This Python package contains your code for training a custom model.\n",
    "\n",
    "- **Use your own custom container image**. If you use your own container, the container needs to contain your code for training a custom model.\n",
    "\n",
    "\n",
    "Here you will use pre-built containers provided by Vertex AI to run training and prediction.\n",
    "\n",
    "For the latest list, see [Pre-built containers for training](https://cloud.google.com/vertex-ai/docs/training/pre-built-containers) and [Pre-built containers for prediction](https://cloud.google.com/vertex-ai/docs/predictions/pre-built-containers)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "TRAIN_VERSION = \"tf-cpu.2-8\"\n",
    "DEPLOY_VERSION = \"tf2-cpu.2-8\"\n",
    "\n",
    "TRAIN_IMAGE = \"us-docker.pkg.dev/vertex-ai/training/{}:latest\".format(TRAIN_VERSION)\n",
    "DEPLOY_IMAGE = \"us-docker.pkg.dev/vertex-ai/prediction/{}:latest\".format(DEPLOY_VERSION)\n",
    "\n",
    "print(\"Training:\", TRAIN_IMAGE, TRAIN_GPU, TRAIN_NGPU)\n",
    "print(\"Deployment:\", DEPLOY_IMAGE, DEPLOY_GPU, DEPLOY_NGPU)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Set machine types\n",
    "\n",
    "Next, set the machine types to use for training and prediction.\n",
    "\n",
    "- Set the variables `TRAIN_COMPUTE` and `DEPLOY_COMPUTE` to configure your compute resources for training and prediction.\n",
    " - `machine type`\n",
    "     - `e2-standard`: 16GB of memory per vCPU\n",
    "     - `n1-highmem`: 6.5GB of memory per vCPU\n",
    "     - `n1-highcpu`: 0.9 GB of memory per vCPU\n",
    " - `vCPUs`: number of \\[2, 4, 8, 16, 32, 64, 96 \\]\n",
    "\n",
    "*Note: The following is not supported for training:*\n",
    "\n",
    " - `standard`: 2 vCPUs\n",
    " - `highcpu`: 2, 4 and 8 vCPUs\n",
    "\n",
    "*Note: You may also use n2 and e2 machine types for training and deployment, but they do not support GPUs*."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "MACHINE_TYPE = \"e2-standard\"\n",
    "\n",
    "VCPU = \"4\"\n",
    "TRAIN_COMPUTE = MACHINE_TYPE + \"-\" + VCPU\n",
    "print(\"Train machine type\", TRAIN_COMPUTE)\n",
    "\n",
    "MACHINE_TYPE = \"e2-standard\"\n",
    "\n",
    "VCPU = \"4\"\n",
    "DEPLOY_COMPUTE = MACHINE_TYPE + \"-\" + VCPU\n",
    "print(\"Deploy machine type\", DEPLOY_COMPUTE)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Training script\n",
    "\n",
    "In the next cell, you will write the contents of the training script, `task.py`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%writefile task.py\n",
    "# Training kmnist using CNN\n",
    "\n",
    "import tensorflow_datasets as tfds\n",
    "import tensorflow as tf\n",
    "from tensorflow.python.client import device_lib\n",
    "import argparse\n",
    "import os\n",
    "import sys\n",
    "tfds.disable_progress_bar()\n",
    "\n",
    "parser = argparse.ArgumentParser()\n",
    "\n",
    "parser.add_argument('--epochs', dest='epochs',\n",
    "                    default=10, type=int,\n",
    "                    help='Number of epochs.')\n",
    "\n",
    "args = parser.parse_args()\n",
    "\n",
    "print('Python Version = {}'.format(sys.version))\n",
    "print('TensorFlow Version = {}'.format(tf.__version__))\n",
    "print('TF_CONFIG = {}'.format(os.environ.get('TF_CONFIG', 'Not found')))\n",
    "print('DEVICES', device_lib.list_local_devices())\n",
    "\n",
    "# Define batch size\n",
    "BATCH_SIZE = 32\n",
    "\n",
    "# Load the dataset\n",
    "datasets, info = tfds.load('kmnist', with_info=True, as_supervised=True)\n",
    "\n",
    "# Normalize and batch process the dataset\n",
    "ds_train = datasets['train'].map(lambda x, y: (tf.cast(x, tf.float32)/255.0, y)).batch(BATCH_SIZE)\n",
    "\n",
    "\n",
    "# Build the Convolutional Neural Network\n",
    "model = tf.keras.models.Sequential([                               \n",
    "      tf.keras.layers.Conv2D(16, (3,3), activation=tf.nn.relu, input_shape=(28, 28, 1), padding = \"same\"),\n",
    "      tf.keras.layers.MaxPooling2D(2,2),\n",
    "      tf.keras.layers.Conv2D(16, (3,3), activation=tf.nn.relu, padding = \"same\"),\n",
    "      tf.keras.layers.MaxPooling2D(2,2),\n",
    "      tf.keras.layers.Flatten(),\n",
    "      tf.keras.layers.Dense(128, activation=tf.nn.relu),\n",
    "      # TODO: Write the last layer.\n",
    "      # Hint: KMNIST has 10 output classes.\n",
    "      \n",
    "    ])\n",
    "\n",
    "model.compile(optimizer = tf.keras.optimizers.Adam(),\n",
    "      loss = tf.keras.losses.SparseCategoricalCrossentropy(),\n",
    "      metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])\n",
    "\n",
    "\n",
    "\n",
    "# Train and save the model\n",
    "\n",
    "MODEL_DIR = os.getenv(\"AIP_MODEL_DIR\")\n",
    "\n",
    "model.fit(ds_train, epochs=args.epochs)\n",
    "\n",
    "# TODO: Save your CNN classifier. \n",
    "# Hint: Save it to MODEL_DIR."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Define the command args for the training script\n",
    "\n",
    "Prepare the command-line arguments to pass to your training script.\n",
    "- `args`: The command line arguments to pass to the corresponding Python module. In this example, they will be:\n",
    "  - `\"--epochs=\" + EPOCHS`: The number of epochs for training."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "JOB_NAME = \"custom_job_\" + TIMESTAMP\n",
    "MODEL_DIR = \"{}/{}\".format(BUCKET_NAME, JOB_NAME)\n",
    "\n",
    "EPOCHS = 5\n",
    "\n",
    "CMDARGS = [\n",
    "    \"--epochs=\" + str(EPOCHS),\n",
    "]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Train the model\n",
    "\n",
    "Define your custom training job on Vertex AI."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "job = aiplatform.CustomTrainingJob(\n",
    "    display_name=JOB_NAME,\n",
    "    requirements=[\"tensorflow_datasets==4.6.0\"],\n",
    "    # TODO: fill in the remaining arguments for the CustomTrainingJob function.\n",
    ")\n",
    "\n",
    "MODEL_DISPLAY_NAME = \"kmnist-\" + TIMESTAMP\n",
    "\n",
    "# Start the training\n",
    "model = job.run(\n",
    "    model_display_name=MODEL_DISPLAY_NAME,\n",
    "    replica_count=1,\n",
    "    accelerator_count=0,\n",
    "    # TODO: fill in the remaining arguments to run the custom training job function.\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Deploy the model\n",
    "\n",
    "Before you use your model to make predictions, you need to deploy it to an `Endpoint`. You can do this by calling the `deploy` function on the `Model` resource."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "DEPLOYED_NAME = \"kmnist_deployed-\" + TIMESTAMP\n",
    "\n",
    "TRAFFIC_SPLIT = {\"0\": 100}\n",
    "\n",
    "MIN_NODES = 1\n",
    "MAX_NODES = 1\n",
    "\n",
    "endpoint = model.deploy(\n",
    "        deployed_model_display_name=DEPLOYED_NAME,\n",
    "        accelerator_type=None,\n",
    "        accelerator_count=0,\n",
    "        # TODO: fill in the remaining arguments to deploy the model to an endpoint.\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Make an online prediction request\n",
    "\n",
    "Send an online prediction request to your deployed model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import tensorflow_datasets as tfds\n",
    "import numpy as np\n",
    "\n",
    "tfds.disable_progress_bar()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "datasets, info = tfds.load('kmnist', batch_size=-1, with_info=True, as_supervised=True)\n",
    "\n",
    "test_dataset = datasets['test']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x_test, y_test = tfds.as_numpy(test_dataset)\n",
    "\n",
    "# Normalize (rescale) the pixel data by dividing each pixel by 255. \n",
    "x_test = x_test.astype('float32') / 255.\n",
    "x_test.shape, y_test.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#@title Pick the number of test images\n",
    "NUM_TEST_IMAGES = 20 #@param {type:\"slider\", min:1, max:20, step:1}\n",
    "x_test, y_test = x_test[:NUM_TEST_IMAGES], y_test[:NUM_TEST_IMAGES]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Send the prediction request"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Logging module added to log the prediction result"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import and configure logging\n",
    "from google.cloud import logging\n",
    "logging_client = logging.Client()\n",
    "logger = logging_client.logger('challenge-notebook')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that you have test images, you can use them to send a prediction request."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# TODO: use your Endpoint to return prediction for your x_test.\n",
    "\n",
    "predictions = \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "y_predicted = np.argmax(predictions.predictions, axis=1)\n",
    "\n",
    "correct = sum(y_predicted == np.array(y_test.tolist()))\n",
    "total = len(y_predicted)\n",
    "\n",
    "logger.log_text(str(correct/total))\n",
    "\n",
    "print(\n",
    "    f\"Correct predictions = {correct}, Total predictions = {total}, Accuracy = {correct/total}\"\n",
    ")"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [],
   "name": "Qwiklab_Vertex_AI_CNN_fmnist.ipynb",
   "provenance": []
  },
  "environment": {
   "kernel": "python3",
   "name": "tf2-gpu.2-8.m104",
   "type": "gcloud",
   "uri": "gcr.io/deeplearning-platform-release/tf2-gpu.2-8:m104"
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
